As mentioned above, the main data has been sourced from the opendata.swiss project, having been collected and published by the Open Data Portal of the City Council of Zürich, under the name “Hundebestände der Stadt Zürich, seit 2015”. The description of the data set from the original source is as follows:
This dataset contains information on dogs and their owners from the municipal dog register since 2015. Information on the age group, gender and statistical district of residence is provided for dog owners. The breed, breed type, sex, year of birth, age and color are recorded for each dog. The dog register is kept by the Dog Control Department of the Zurich City Police.
For the sake of a seamless workflow and easier interpretation of the variables within our group, the names of columns as well as certain string values have been translated to English from the original German version.
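The translation step described above can be sketched with two lookup vectors, one for column names and one for string values. This is only a minimal illustration on a toy data frame: the German column names (`StichtagDatJahr`, `HalterId`, `RasseMischlingLang`), the lookup tables and the English label "Pedigree dog" are assumptions made for the example, not the mapping actually used in the project.

```r
# Toy stand-in for the raw register; the German column names are assumptions.
raw <- data.frame(
  StichtagDatJahr    = c(2015L, 2015L, 2015L),
  HalterId           = c(126L, 574L, 695L),
  RasseMischlingLang = c("Rassehund", "Rassehund", "Mischling"),
  stringsAsFactors   = FALSE
)

# Hypothetical lookup: German column name -> English column name.
col_map <- c(
  StichtagDatJahr    = "ReferenceYear",
  HalterId           = "OwnerId",
  RasseMischlingLang = "MixedBreedText"
)

# Rename only the columns that have an entry in the lookup.
hits <- names(raw) %in% names(col_map)
names(raw)[hits] <- unname(col_map[names(raw)[hits]])

# Hypothetical lookup for string values; unmapped values are left unchanged.
value_map  <- c(Rassehund = "Pedigree dog")
translated <- unname(value_map[raw$MixedBreedText])
raw$MixedBreedText <- ifelse(is.na(translated), raw$MixedBreedText, translated)
```

Keeping the mappings in named vectors makes the translation easy to review and extend, and values without a translation pass through untouched rather than turning into `NA`.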
The main source of data is the kul100od1001.csv file, which contains 70,967 records with 33 variables. The working data frame df.dogs keeps 15 of these variables:
dim(df.dogs)
## [1] 70967 15
str(df.dogs)
## 'data.frame': 70967 obs. of 15 variables:
## $ ReferenceYear : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
## $ OwnerId : int 126 574 695 893 1177 4004 4050 4155 4203 4215 ...
## $ AgeV10Text : chr "60- bis 69-Jährige" "60- bis 69-Jährige" "40- bis 49-Jährige" "60- bis 69-Jährige" ...
## $ OwnerSexText : chr "männlich" "weiblich" "männlich" "weiblich" ...
## $ DistrictText : chr "Kreis 9" "Kreis 2" "Kreis 6" "Kreis 7" ...
## $ QuarterText : chr "Altstetten" "Leimbach" "Oberstrass" "Fluntern" ...
## $ Breed1Text : chr "Welsh Terrier" "Cairn Terrier" "Labrador Retriever" "Mittelschnauzer" ...
## $ Breed2Text : chr "Keine" "Keine" "Keine" "Keine" ...
## $ MixedBreedText: chr "Rassehund" "Rassehund" "Rassehund" "Rassehund" ...
## $ BreedTypeLong : chr "Kleinwüchsig" "Kleinwüchsig" "Rassentypenliste I" "Rassentypenliste I" ...
## $ DogBirthYear : int 2011 2002 2012 2010 2011 2010 2012 2002 2005 2001 ...
## $ DogAgeCoded : int 3 12 2 4 3 4 2 12 9 13 ...
## $ DogSexText : chr "weiblich" "weiblich" "weiblich" "weiblich" ...
## $ DogColorText : chr "schwarz/braun" "brindle" "braun" "schwarz" ...
## $ NumberOfDogs : int 1 1 1 1 1 1 1 1 1 1 ...
As the structure shows, the data set comprises variables of several data types. In the source file, most variables are provided in three variants: two integer forms (Coded and Sort) and one string form (Text). For each variable we kept the variant best suited to its use in the study, so our selection of relevant variables can be summarized as follows:
Numerical values:
ReferenceYear: numerical value for the reference year
OwnerId: numerical identifier for the owner of the registered dog
AgeV10Sort: referring to the owner’s age as a 10-year category
DogBirthYear: numerical value for the birth year of the dog
DogAgeSort: referring to the dog’s age at the time of registration
NumberOfDogs: numerical counter of the dog count for each dog owner
Binary variables:
DogSexText: two-level value for the biological sex of the dog
String values:
DistrictText: the name of each larger district of Zürich according to the official division
QuarterText: the name of the smaller neighbourhoods that make up the larger districts
Breed1Text and Breed2Text: the breed designations of the dog
MixedBreedText: additional information on whether the dog is a mixed breed
DogColorText: a descriptive name for the colour of the dog
BreedTypeLong: the official dog type classification according to Zürich cantonal law
The original data set has been complemented with the GEOJSON file
stzh.adm_stadtkreise_a.geojson for the production of map
plots. Both data sets are merged on the district name, following the
district division defined by the City Council of Zürich.
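The merge on district names can be sketched as follows. To keep the example runnable without spatial packages, a plain data frame stands in for the attribute table of the GeoJSON file; in practice the file would more likely be read with something like `sf::st_read()`. The column names `district_name` and `polygon_id`, as well as all counts, are hypothetical.

```r
# Toy dog counts per district (made-up numbers).
counts <- data.frame(
  DistrictText = c("Kreis 1", "Kreis 2", "Unbekannt (Stadt Zürich)"),
  DogCount     = c(10L, 25L, 3L),
  stringsAsFactors = FALSE
)

# Stand-in for the GeoJSON attribute table; column names are hypothetical.
districts <- data.frame(
  district_name = c("Kreis 1", "Kreis 2", "Kreis 3"),
  polygon_id    = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Left join: keep every district polygon, even if it has no count yet.
map_data <- merge(districts, counts,
                  by.x = "district_name", by.y = "DistrictText",
                  all.x = TRUE)

# Districts in the register that have no polygon and cannot be mapped.
unmatched <- setdiff(counts$DistrictText, districts$district_name)
```

A left join from the geometry side keeps all polygons on the map, while `unmatched` makes it explicit which register entries (here the records with an unknown district) drop out of the map plots.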
Because the data set consists mostly of categorical variables and contains few quantitative ones, we organize the exploratory analysis around counts computed for different groupings. We then match each model in our study to the research questions and variables it is best suited for. The following insights and plots give a first look at the data set and point to possible research questions.
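As an illustration of a count-based grouping, the sketch below counts registered dogs per reference year and district on a toy data frame. It assumes that each row represents one registered dog, and it produces a table with the same columns as the `dog_count_per_neighborhood_year` object used later (`ReferenceYear`, `DistrictText`, `DogCount`); how that object was actually built is not shown in this section.

```r
# Toy register extract (made-up rows); one row per registered dog is assumed.
dogs <- data.frame(
  ReferenceYear = c(2015L, 2015L, 2015L, 2016L, 2016L),
  OwnerId       = c(126L, 574L, 695L, 126L, 695L),
  DistrictText  = c("Kreis 9", "Kreis 9", "Kreis 2", "Kreis 9", "Kreis 2"),
  stringsAsFactors = FALSE
)

# Count rows per year and district.
dog_counts <- aggregate(OwnerId ~ ReferenceYear + DistrictText,
                        data = dogs, FUN = length)

# Rename the counted column so it reads as a count.
names(dog_counts)[names(dog_counts) == "OwnerId"] <- "DogCount"
```

The same pattern works for any other grouping, for example by breed or by owner age category, by swapping the variables on the right-hand side of the formula.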
ggplotly(fit01_ggplot)
## `geom_smooth()` using formula = 'y ~ x'
lm.counts.year <- lm(DogCount ~ ReferenceYear * DistrictText,
data = dog_count_per_neighborhood_year)
summary(lm.counts.year)
##
## Call:
## lm(formula = DogCount ~ ReferenceYear * DistrictText, data = dog_count_per_neighborhood_year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54.572 -16.100 -0.033 16.181 58.222
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -5.527e+03 7.289e+03
## ReferenceYear 2.800e+00 3.610e+00
## DistrictTextKreis 10 -3.312e+04 1.031e+04
## DistrictTextKreis 11 -1.047e+05 1.031e+04
## DistrictTextKreis 12 -3.837e+04 1.031e+04
## DistrictTextKreis 2 -9.000e+04 1.031e+04
## DistrictTextKreis 3 -5.168e+04 1.031e+04
## DistrictTextKreis 4 -2.623e+04 1.031e+04
## DistrictTextKreis 5 -3.056e+04 1.031e+04
## DistrictTextKreis 6 -3.980e+04 1.031e+04
## DistrictTextKreis 7 -7.468e+04 1.031e+04
## DistrictTextKreis 8 -2.999e+04 1.031e+04
## DistrictTextKreis 9 -8.681e+04 1.031e+04
## DistrictTextUnbekannt (Stadt Zürich) 5.263e+03 1.170e+04
## ReferenceYear:DistrictTextKreis 10 1.670e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 11 5.245e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 12 1.922e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 2 4.488e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 3 2.588e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 4 1.313e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 5 1.520e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 6 1.992e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 7 3.747e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 8 1.500e+01 5.106e+00
## ReferenceYear:DistrictTextKreis 9 4.342e+01 5.106e+00
## ReferenceYear:DistrictTextUnbekannt (Stadt Zürich) -2.668e+00 5.798e+00
## t value Pr(>|t|)
## (Intercept) -0.758 0.450412
## ReferenceYear 0.776 0.440152
## DistrictTextKreis 10 -3.213 0.001858 **
## DistrictTextKreis 11 -10.157 2.49e-16 ***
## DistrictTextKreis 12 -3.722 0.000354 ***
## DistrictTextKreis 2 -8.731 1.89e-13 ***
## DistrictTextKreis 3 -5.013 2.89e-06 ***
## DistrictTextKreis 4 -2.545 0.012742 *
## DistrictTextKreis 5 -2.965 0.003933 **
## DistrictTextKreis 6 -3.861 0.000220 ***
## DistrictTextKreis 7 -7.244 1.83e-10 ***
## DistrictTextKreis 8 -2.910 0.004618 **
## DistrictTextKreis 9 -8.421 8.01e-13 ***
## DistrictTextUnbekannt (Stadt Zürich) 0.450 0.654061
## ReferenceYear:DistrictTextKreis 10 3.271 0.001550 **
## ReferenceYear:DistrictTextKreis 11 10.273 < 2e-16 ***
## ReferenceYear:DistrictTextKreis 12 3.764 0.000307 ***
## ReferenceYear:DistrictTextKreis 2 8.791 1.43e-13 ***
## ReferenceYear:DistrictTextKreis 3 5.070 2.30e-06 ***
## ReferenceYear:DistrictTextKreis 4 2.572 0.011839 *
## ReferenceYear:DistrictTextKreis 5 2.977 0.003790 **
## ReferenceYear:DistrictTextKreis 6 3.901 0.000191 ***
## ReferenceYear:DistrictTextKreis 7 7.338 1.19e-10 ***
## ReferenceYear:DistrictTextKreis 8 2.938 0.004252 **
## ReferenceYear:DistrictTextKreis 9 8.504 5.46e-13 ***
## ReferenceYear:DistrictTextUnbekannt (Stadt Zürich) -0.460 0.646508
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.96 on 85 degrees of freedom
## Multiple R-squared: 0.9953, Adjusted R-squared: 0.9939
## F-statistic: 718.6 on 25 and 85 DF, p-value: < 2.2e-16
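In a model with a `ReferenceYear * DistrictText` interaction, the `ReferenceYear` coefficient is the yearly trend of the reference district (Kreis 1, the level that does not appear among the coefficients), and each interaction term is the difference between a district's trend and that reference trend. With the estimates reported above, the trend for Kreis 11 is therefore roughly 2.80 + 52.45 ≈ 55.25 dogs per year. The sketch below reproduces this calculation on made-up data with known slopes; the numbers are illustrative and not taken from the register.

```r
# Toy counts with known linear trends (made-up numbers).
toy <- expand.grid(
  ReferenceYear = 2015:2020,
  DistrictText  = c("Kreis 1", "Kreis 11"),
  stringsAsFactors = FALSE
)
toy$DogCount <- ifelse(
  toy$DistrictText == "Kreis 1",
  100 + 3  * (toy$ReferenceYear - 2015),  # reference district: +3 per year
  400 + 55 * (toy$ReferenceYear - 2015)   # Kreis 11: +55 per year
)

# Same model structure as lm.counts.year.
fit <- lm(DogCount ~ ReferenceYear * DistrictText, data = toy)
b   <- coef(fit)

# Trend of the reference district is the main effect alone.
slope_ref <- b[["ReferenceYear"]]

# Trend of another district is the main effect plus its interaction term.
slope_k11 <- b[["ReferenceYear"]] + b[["ReferenceYear:DistrictTextKreis 11"]]
```

The very large intercept and district main effects in the summary arise because they describe the extrapolated count at year 0. Centering the year (for example, using `ReferenceYear - 2015`) would make them easier to read without changing the estimated trends.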
# Display summary of the model
#summary(binomial_model)